
FIX: Preserve DatasetConfiguration subclass when backend overrides dataset_names #1911

Merged
varunj-msft merged 2 commits into microsoft:main from varunj-msft:varunj-msft/8380-Standardizing-Scenarios-type-preservation-fix on Jun 4, 2026

@varunj-msft (Contributor)

Description

Previously, whenever dataset_names was passed through the backend (ScenarioRunService._build_init_kwargs), we always constructed a plain DatasetConfiguration. That silently dropped subclass-specific behavior, most notably EncodingDatasetConfiguration.get_all_seed_attack_groups(), which shapes each seed into a SeedAttackGroup with a synthetic objective.

For garak.encoding this surfaced as a confusing runtime error during attack construction:

ValueError: SeedAttackGroup must have exactly one objective. Found 0.

Reproducible end-to-end against the real garak_slur_terms_en dataset.
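The failure mode can be sketched with simplified stand-ins. Everything below is hypothetical: the real DatasetConfiguration, EncodingDatasetConfiguration, and SeedAttackGroup have different constructors, the synthetic objective text is invented, and build_attack/try_build only stand in for attack construction. The sketch shows why building the base class loses the override that supplies the objective.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified stand-ins for the PyRIT types named above.
@dataclass
class SeedAttackGroup:
    seeds: list[str]
    objectives: list[str] = field(default_factory=list)


class DatasetConfiguration:
    def __init__(self, dataset_names: list[str]) -> None:
        self.dataset_names = dataset_names

    def load_seeds(self) -> list[str]:
        # Assumption: each dataset yields a single placeholder seed.
        return [f"seed from {name}" for name in self.dataset_names]

    def get_all_seed_attack_groups(self) -> list[SeedAttackGroup]:
        # Base behavior in this sketch: one group per seed, no objective attached.
        return [SeedAttackGroup(seeds=[seed]) for seed in self.load_seeds()]


class EncodingDatasetConfiguration(DatasetConfiguration):
    def get_all_seed_attack_groups(self) -> list[SeedAttackGroup]:
        # Override: attach a synthetic objective to every seed (wording is made up).
        return [
            SeedAttackGroup(seeds=[seed], objectives=[f"Decode and repeat: {seed}"])
            for seed in self.load_seeds()
        ]


def build_attack(group: SeedAttackGroup) -> str:
    # Hypothetical stand-in for attack construction; only the validation matters.
    if len(group.objectives) != 1:
        raise ValueError(
            f"SeedAttackGroup must have exactly one objective. Found {len(group.objectives)}."
        )
    return f"attack({group.objectives[0]!r})"


def try_build(config: DatasetConfiguration) -> str:
    # Build an attack for every group, reporting the first validation error.
    try:
        return ", ".join(build_attack(g) for g in config.get_all_seed_attack_groups())
    except ValueError as exc:
        return f"ValueError: {exc}"


names = ["garak_slur_terms_en"]
print(try_build(DatasetConfiguration(dataset_names=names)))  # pre-fix: plain base class
print(try_build(EncodingDatasetConfiguration(dataset_names=names)))  # subclass preserved
```

With the plain base class every group reaches attack construction with zero objectives, which reproduces the error message quoted above; the subclass override supplies exactly one.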

Fix: when dataset_names is supplied, build a fresh instance of the scenario's own default-dataset-config class so subclass overrides are preserved. If a future subclass adds required init kwargs we can't populate, fall back to the plain DatasetConfiguration with a logged warning so the operator has a trail.

The max_dataset_size-only path is unchanged — it still mutates the throwaway introspection instance's default config.
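Both paths can be sketched as a small helper. This is a minimal illustration, not the merged code: build_dataset_config and StrictDatasetConfiguration are hypothetical names, the stand-in constructors are simplified, and treating "required init kwargs we can't populate" as a TypeError from the constructor call is an assumption.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


class DatasetConfiguration:
    """Hypothetical, simplified stand-in; the real PyRIT constructor differs."""

    def __init__(
        self,
        dataset_names: Optional[list[str]] = None,
        max_dataset_size: Optional[int] = None,
    ) -> None:
        self.dataset_names = dataset_names or []
        self.max_dataset_size = max_dataset_size


class EncodingDatasetConfiguration(DatasetConfiguration):
    """Stand-in subclass; its overrides are omitted in this sketch."""


class StrictDatasetConfiguration(DatasetConfiguration):
    """Hypothetical future subclass with a required kwarg the backend can't populate."""

    def __init__(self, *, encoding: str, **kwargs: Any) -> None:
        super().__init__(**kwargs)
        self.encoding = encoding


def build_dataset_config(
    default_config: DatasetConfiguration,
    dataset_names: Optional[list[str]] = None,
    max_dataset_size: Optional[int] = None,
) -> DatasetConfiguration:
    """Sketch of the dataset-config step inside a _build_init_kwargs-style helper."""
    if dataset_names is not None:
        config_cls = type(default_config)
        try:
            # Fresh instance of the scenario's own class, so subclass overrides survive.
            return config_cls(dataset_names=dataset_names, max_dataset_size=max_dataset_size)
        except TypeError:
            # Assumption: an unpopulated required kwarg surfaces as a TypeError.
            logger.warning(
                "Could not construct %s from dataset_names; falling back to DatasetConfiguration",
                config_cls.__name__,
            )
            return DatasetConfiguration(dataset_names=dataset_names, max_dataset_size=max_dataset_size)

    # max_dataset_size-only path: mutate the throwaway introspection instance's default.
    if max_dataset_size is not None:
        default_config.max_dataset_size = max_dataset_size
    return default_config
```

For example, build_dataset_config(EncodingDatasetConfiguration(), ["garak_slur_terms_en"], 10) returns an EncodingDatasetConfiguration, while a StrictDatasetConfiguration default falls back to the base class and logs the warning. Catching TypeError is deliberately broad in this sketch: a TypeError raised for an unrelated reason inside a subclass constructor would also trigger the fallback, which is why the logged warning matters as a trail.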

First in a series of small PRs for the Standardizing Scenarios work. It lands ahead of the Encoding scenario standardization PR, which depends on this fix to make the documented fast path usable via the API.

Tests and Documentation

  • 5 new regression tests in tests/unit/backend/test_scenario_run_service.py covering: subclass preservation with dataset_names, with dataset_names + max_dataset_size, with dataset_names only (no max), fallback to plain DatasetConfiguration when subclass init is incompatible (+ caplog assertion on the warning), and the introspection-failure path.
  • All 5 new tests fail against pre-fix code; verified by reverting the prod change and rerunning.
  • All 30 pre-existing tests in the file still pass.
  • Full backend suite: 619 passed, 4 skipped.
  • Full scenario suite: 624 passed.
  • ruff check + ruff format --check + ty all clean on both touched files.
  • No Jupytext / notebook changes (backend service fix, no doc impact).

…es dataset_names

ScenarioRunService._build_init_kwargs() used to construct a plain
DatasetConfiguration whenever the caller passed dataset_names. This
silently lost subclass-specific behavior such as
EncodingDatasetConfiguration.get_all_seed_attack_groups(), which
shapes each seed into a SeedAttackGroup with a synthetic objective.

The downstream symptom for the Encoding scenario was:
  ValueError: SeedAttackGroup must have exactly one objective. Found 0.

raised during attack construction. Reproducible end-to-end against the
real garak_slur_terms_en dataset.

Fix: when dataset_names is supplied, construct a fresh instance of the
scenario's own default-dataset-config class so subclass overrides are
preserved. Fall back to the plain DatasetConfiguration (with a logged
warning) if a future subclass adds required __init__ kwargs we cannot
populate.

The max_dataset_size-only path still reuses and mutates the throwaway
introspection instance's default config (no behavior change).

Tests:
- 5 new regression tests, all of which fail against pre-fix code.
- All 30 existing tests still pass.
- Full backend suite: 619 passed, 4 skipped.
- Full scenario suite: 624 passed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread pyrit/backend/services/scenario_run_service.py Outdated
@hannahwestra25 left a comment

small nits but lgtm! thanks for fixing

@varunj-msft enabled auto-merge June 4, 2026 17:27
@varunj-msft added this pull request to the merge queue Jun 4, 2026
Merged via the queue into microsoft:main with commit 01f4abc Jun 4, 2026
52 checks passed
@varunj-msft deleted the varunj-msft/8380-Standardizing-Scenarios-type-preservation-fix branch June 4, 2026 17:56

2 participants